Introduction

This document presents the work done for Assignment 3 in the Statistical Learning course at Universidad Carlos III de Madrid. The objective of this assignment is to implement the Expectation-Maximization (EM) algorithm for estimating the parameters of a Gaussian Mixture Model (GMM) across different data dimensions.

Specifically, the assignment requires:

This report details the methodology, results, and insights gained from the implementation and testing of the EM algorithm. # Functions ## Initialization function Initialization is the first step in the EM algorithm and is crucial because convergence depends heavily on its setup, while the quality of the initial clusters influences the final solution. In this implementation, we initialize the cluster means by randomly selecting KK data points, ensuring that the means start within a reasonable range. The covariance matrices are set to identity matrices to maintain numerical stability and prevent singularities. The mixture coefficients are uniformly initialized to ensure that no cluster dominates the process at the start. This method provides a simple and effective way to initialize EM without relying on K-means, making the procedure fully autonomous.

initialize_em_parameters = function(X, K) {
  N = nrow(X)  # Data rows
  D = ncol(X)  # Data columns
  
  # Randomly select K data points as initial means
  mu = X[sample(1:N, K), ]
  
  # Initialize covariance matrices as identity matrices
  sigma = array(diag(D), dim = c(D, D, K))
  
  # Initialize equal mixing coefficients
  pi_k = rep(1/K, K)
  
  return(list(mu = mu, sigma = sigma, pi_k = pi_k))
}

Plot 1D, 2Dand 3D functions

Generate plot for 1D and 2D image

# Function to compute 2D Gaussian ellipses
get_ellipse = function(mu, sigma, npoints = 100) {
  angles = seq(0, 2 * pi, length.out = npoints)
  circle = cbind(cos(angles), sin(angles))  # Unit circle
  chol_decomp = chol(sigma)  # Cholesky decomposition of covariance
  ellipse_points = circle %*% t(chol_decomp)  # Transform to Gaussian shape
  ellipse_points = sweep(ellipse_points, 2, mu, "+")  # Shift to mean
  return(as.data.frame(ellipse_points))
}

# Function to plot 1D, 2D, and 3D GMM results
plot_1D_2D_3D = function(X, gamma, mu, sigma) {
  D = ncol(X)  # Dimension of data (1D, 2D, or 3D)
  K = nrow(as.matrix(mu))  # Ensure mu is a matrix
  cluster_labels = apply(gamma, 1, which.max)  # Assign each point to a cluster
  
  data_plot = as.data.frame(X)
  data_plot$cluster = as.factor(cluster_labels)  # Convert to categorical
  
  # ---- 1D Case ----
  if (D == 1) {
    p = ggplot(data_plot, aes(x = V1, fill = cluster)) +
      geom_histogram(alpha = 0.5, bins = 30, position = "identity") +
      geom_vline(xintercept = as.vector(mu), col = "black", linetype = "dashed", size = 1) +
      labs(title = "1D Gaussian Mixture Clustering", x = "Value", y = "Frequency") +
      theme_minimal()
    return(p)
    
  # ---- 2D Case ----
  } else if (D == 2) {
    p = ggplot(data_plot, aes(x = V1, y = V2, color = cluster)) +
      geom_point(alpha = 0.6) +  # Data points
      geom_point(data = as.data.frame(mu), aes(x = V1, y = V2), 
                 color = "black", size = 4, shape = 4) +  # Cluster means
    
      # ---- Add Gaussian Ellipses ----
      lapply(1:K, function(k) {
        ellipse_data = get_ellipse(mu[k,], sigma[,,k])
        geom_path(data = ellipse_data, aes(x = V1, y = V2), color = "black", linetype = "dashed")
      }) +
      geom_density_2d()+
      
      labs(title = "2D Gaussian Mixture Clustering", x = "X", y = "Y") +
      theme_minimal()
    return(p)
    
  # ---- 3D Case ----
  } else if (D == 3) {
    # 3D Plot using plotly (same as before)
    plot_3D = plot_ly() %>%
      add_trace(
        data = data_plot, 
        x = ~V1, y = ~V2, z = ~V3, 
        type = "scatter3d", mode = "markers",
        marker = list(size = 3, opacity = 0.7),
        color = ~cluster  # Ensure color length matches points
      ) %>%
      add_trace(
        data = as.data.frame(mu), 
        x = ~V1, y = ~V2, z = ~V3, 
        type = "scatter3d", mode = "markers",
        marker = list(size = 6, color = "black", symbol = "x")
      ) %>%
      layout(
        title = "3D Gaussian Mixture Clustering",
        scene = list(xaxis = list(title = "X"), 
                     yaxis = list(title = "Y"), 
                     zaxis = list(title = "Z"))
      )
    
    return(plot_3D)
    
  } else {
    stop("This function only supports 1D, 2D, and 3D data!")
  }
}

Plot 3D or more function (for images)

reconstruct_image = function(image_path, em_result, output_path = "segmented_image.jpg") {
  # Load the original image
  img = readJPEG(image_path)
  dim_img = dim(img)
  
  # Get cluster assignments (each pixel assigned to a cluster)
  clusters = apply(em_result$gamma, 1, which.max)
  
  # Reconstruct the image using the cluster means
  segmented_matrix = em_result$mu[clusters, ]
  
  # Reshape back into the original image format
  segmented_img = array(segmented_matrix, dim = dim_img)
  
  # **Fix orientation issues**
  segmented_img = aperm(segmented_img, c(2, 1, 3))  # Swap rows and columns
  
  # Save the corrected segmented image
  writeJPEG(segmented_img, output_path)
  
  # Display the corrected segmented image
  plot(as.cimg(segmented_img))
  
  # Prevent automatic printing
  invisible(segmented_img)
}

function for log likelihood convergency

plot_log_likelihood = function(log_likelihood_values) {
  iterations = 1:length(log_likelihood_values)
  log_likelihood_df = data.frame(Iteration = iterations, LogLikelihood = log_likelihood_values)
  
  ggplot(log_likelihood_df, aes(x = Iteration, y = LogLikelihood)) +
    geom_line(color = "blue", size = 1) +
    geom_point(color = "red", size = 2) +
    labs(title = "EM Algorithm Log-Likelihood Convergence",
         x = "Iteration",
         y = "Log-Likelihood") +
    theme_minimal()
}

EM function

The EM function was implemented following the algorithm described in Pattern Recognition and Machine Learning by Christopher Bishop (2006, Chapter 9). The Expectation-Maximization (EM) algorithm iteratively estimates the parameters of a Gaussian Mixture Model (GMM) by maximizing the likelihood function. In the E-step, it computes the responsibilities \(\gamma_{ik}\), which represent the probability that each data point \(\mathbf{x}_i\) belongs to cluster \(k\):

\[ \gamma_{ik} = \frac{\pi_k \mathcal{N}(\mathbf{x}_i | \mathbf{\mu}_k, \mathbf{\Sigma}_k)} {\sum_{j=1}^{K} \pi_j \mathcal{N}(\mathbf{x}_i | \mathbf{\mu}_j, \mathbf{\Sigma}_j)} \]

where \(\pi_k\) is the mixing coefficient, \(\mathbf{\mu}_k\) is the mean, and \(\mathbf{\Sigma}_k\) is the covariance matrix of cluster \(k\). In the M-step, the parameters are updated using the computed responsibilities. The new cluster means are estimated as:

\[ \mathbf{\mu}_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \mathbf{x}_i \]

where \(N_k = \sum_{i=1}^{N} \gamma_{ik}\) represents the effective number of points assigned to cluster \(k\). The covariance matrices are updated as:

\[ \mathbf{\Sigma}_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} (\mathbf{x}_i - \mathbf{\mu}_k)(\mathbf{x}_i - \mathbf{\mu}_k)^T + \epsilon I \]

where \(\epsilon I\) is a small regularization term added for numerical stability, as suggested in Bishop (2006, Section 9.2.2). The algorithm iterates until the log-likelihood function:

\[ \log L = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \mathcal{N}(\mathbf{x}_i | \mathbf{\mu}_k, \mathbf{\Sigma}_k) \right) \]

converges. The function ensures an optimal and numerically stable implementation of EM for 1D, 2D, and high-dimensional data, including image segmentation. ## Function of EM for 1D

EM_GMM_1D = function(X, K, max_iter = 100, tol = 1e-6) {
  N = length(X)  # n data
  
  set.seed(234) 
  mu = runif(K, min(X), max(X))  # Random means
  sigma = rep(var(X), K)  # Initialize  sample variance
  pi_k = rep(1/K, K)  # Equal mixture weights initially
  
  gamma = matrix(0, N, K)  # Responsibility matrix
  log_likelihood_values = numeric(max_iter)  # Store log-likelihood values
  
  log_likelihood = function() {
    sum(log(rowSums(sapply(1:K, function(k) {
      pi_k[k] * dnorm(X, mean = mu[k], sd = sqrt(sigma[k]))
    }))))
  }
  
  logL = log_likelihood()
  log_likelihood_values[1] = logL
  
  for (iter in 2:max_iter) {
    # E-step: Compute responsibilities
    for (k in 1:K) {
      gamma[, k] = pi_k[k] * dnorm(X, mean = mu[k], sd = sqrt(sigma[k]))
    }
    gamma = gamma / rowSums(gamma)
  

    
    # M-step: Update parameters
    Nk = colSums(gamma)  # Effective number of points in each cluster
    pi_k = Nk / N  # Update mixing coefficients
    mu = colSums(gamma * X) / Nk  # Update means
    sigma = sapply(1:K, function(k) sum(gamma[, k] * (X - mu[k])^2) / Nk[k])
    
    
    # Compute new log-likelihood
    new_logL = log_likelihood()
    log_likelihood_values[iter] = new_logL
    
    # Check for convergence
    if (abs(new_logL - logL) < tol) {
      message("Converged at iteration ", iter)
      log_likelihood_values = log_likelihood_values[1:iter]
      break
    }
    logL = new_logL
  }
  
  return(list(mu = mu, sigma = sigma, pi_k = pi_k, gamma = gamma, log_likelihood = log_likelihood_values))
}
EM_GMM = function(X, K, max_iter = 100, tol = 1e-6) {
  # Check if X is a single-column matrix (1D case)
  if (ncol(X) == 1) {
    print("1D detected, calling EM_GMM_1D")
    return(EM_GMM_1D(as.vector(X), K, max_iter, tol))  # Convert to vector for 1D function
  }
  
  N = nrow(X)  # Number of data points
  D = ncol(X)  # Data dimensions
  
  
  params = initialize_em_parameters(X, K)
  mu = params$mu
  sigma = params$sigma
  pi_k = params$pi_k
  
  gamma = matrix(0, N, K)  # Responsibility matrix
  log_likelihood_values = numeric(max_iter)  # Store log-likelihood values
  
  X = scale(X)
  mu = scale(mu)
  
  log_likelihood = function() {
    sum(log(rowSums(sapply(1:K, function(k) {
      pi_k[k] * dmvnorm(X, mean = as.numeric(mu[k, , drop = FALSE]), sigma = sigma[, , k])
    }))))
  }
  
  logL = log_likelihood()
  log_likelihood_values[1] = logL  
  
  
  for (iter in 2:max_iter) {
    for (k in 1:K) {
      gamma[, k] = pi_k[k] * dmvnorm(X, mean = as.numeric(mu[k, , drop = FALSE]), sigma = sigma[, , k])
    }
    gamma = gamma / rowSums(gamma)
    
    Nk = colSums(gamma)
    pi_k = Nk / N
    mu = matrix(t(gamma) %*% X / Nk, ncol = D, byrow = TRUE)
    
    if (K == 1) {
      mu = matrix(mu, nrow = 1, ncol = D)
    }
    
    for (k in 1:K) {
      X_centered = sweep(X, 2, mu[k, , drop = FALSE], FUN = "-")
      sigma[, , k] = t(X_centered) %*% (X_centered * gamma[, k]) / Nk[k] + diag(1e-6, D)
    }
    
    new_logL = log_likelihood()
    log_likelihood_values[iter] = new_logL  
    
    if (abs(new_logL - logL) < tol) {
      message("Converged at iteration ", iter)
      log_likelihood_values = log_likelihood_values[1:iter]  
      break
    }
    logL = new_logL
  }
  

  return(list(mu = mu, sigma = sigma, pi_k = pi_k, gamma = gamma, log_likelihood = log_likelihood_values))
}

Test for different dimensions

1D

The first test is 1D this is a vector with 2 groups within the same vector.

set.seed(123)
X_1D = matrix(c(rnorm(100, mean = -2, sd = 1), 
                 rnorm(100, mean = 3, sd = 1.5)), ncol = 1)

# Run EM Algorithm for 1D data
result_1D = EM_GMM(X_1D, K = 2)
## [1] "1D detected, calling EM_GMM_1D"
## Converged at iteration 59
# Corrected function call
plot_1D_2D_3D(X_1D, result_1D$gamma, result_1D$mu, result_1D$sigma)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

plot_log_likelihood(result_1D$log_likelihood)

2D

We are testing this within the matrix with multivariable data of 2 different population

# Generate synthetic 2D data
set.seed(123)
X_2D = rbind(
  mvrnorm(n = 100, mu = c(-2, -2), Sigma = matrix(c(1, 0.5, 0.5, 1), ncol = 2)),
  mvrnorm(n = 100, mu = c(3, 3), Sigma = matrix(c(1, -0.3, -0.3, 1), ncol = 2))
)

# Run EM Algorithm for 2D data
result_2D = EM_GMM(X_2D, K = 2)
## Converged at iteration 11
# Plot the clustering results
plot_1D_2D_3D(scale(X_2D), result_2D$gamma, t(result_2D$mu), result_2D$sigma)

plot_log_likelihood(result_2D$log_likelihood)

#3D

true_means = matrix(c(5, 5, 5,  -5, -5, -5,  5, -5, 5), ncol = 3, byrow = TRUE)
true_covariances = list(
  diag(c(2, 1, 1)),  # Cluster 1
  diag(c(1, 2, 1)),  # Cluster 2
  diag(c(1, 1, 2))   # Cluster 3
)

# Generate data from 3 different Gaussians
X_list = lapply(1:3, function(k) {
  mvrnorm(n = 300 / 3, mu = true_means[k,], Sigma = true_covariances[[k]])
})

# Combine all generated data
X_3D = do.call(rbind, X_list)
colnames(X_3D) = c("V1", "V2", "V3")
em_results = EM_GMM(X_3D, K = 3)
## Converged at iteration 12
# Extract fitted parameters
gamma_est = em_results$gamma
mu_est = em_results$mu
sigma_est = em_results$sigma

# -----------------------
# STEP 3: Plot using plot_GMM()
# -----------------------
plot_1D_2D_3D(scale(X_3D), gamma_est, t(mu_est), sigma_est)

Image Segmentation Results

The images below show how the Expectation-Maximization (EM) algorithm groups pixels with similar colors to segment different parts of an image.

Since EM models the pixel distribution using a Gaussian Mixture Model (GMM), it clusters pixels based on their RGB values. Here are some key takeaways:

  • It works well when colors are clearly distinct, effectively separating different regions.
  • However, it struggles with gradual transitions, like shadows or soft gradients, leading to blurry or inaccurate segmentation.
  • In some areas, EM introduces noise by splitting similar colors into different clusters when they should belong to the same object.

Overall, EM can be useful for image segmentation, but its effectiveness depends on how well colors form separate groups. When boundaries between colors are less defined, EM may not perform as expected.

# Load and reshape the image
img = readJPEG("imagen_assignment2/20BT.jpg")
dim_img = dim(img)

# Reshape into N × 3 (RGB) matrix
img_reshaped = cbind(as.vector(img[,,1]), as.vector(img[,,2]), as.vector(img[,,3]))

# Apply EM algorithm
result = EM_GMM(img_reshaped, K = 3)

reconstruct_image("C:/Users/flore/Desktop/git/Statistical_learning/Test_alpha/20BT.jpg",result)

# Load and reshape the image
img2 = readJPEG("imagen_assignment2/Melanoma.jpg")
dim_img = dim(img2)

# Reshape into N × 3 (RGB) matrix
img_reshaped2 = cbind(as.vector(img2[,,1]), as.vector(img2[,,2]), as.vector(img2[,,3]))

# Apply EM algorithm
result2 = EM_GMM(img_reshaped2, K = 2)
## Converged at iteration 65
reconstruct_image("imagen_assignment2/Melanoma.jpg",result2)
## Warning in as.cimg.array(segmented_img): Assuming third dimension corresponds
## to colour

# Load and reshape the image

img2 = readJPEG("imagen_assignment2/test_1.jpeg")
dim_img = dim(img2)

# Reshape into N × 3 (RGB) matrix
img_reshaped2 = cbind(as.vector(img2[,,1]), as.vector(img2[,,2]), as.vector(img2[,,3]))

# Apply EM algorithm
result2 = EM_GMM(img_reshaped2, K = 5)

reconstruct_image("imagen_assignment2/test_1.jpeg",result2)

## Classificacion with EM

In this part we are trying to see if the EM algorithm works for classification. First doing PCA and then apply the EM to create classification.

### Read Data functions
read_all_images = function(folder_path) {
  image_files = list.files(folder_path, full.names = TRUE, pattern = "\\.(jpg|png|jpeg|tiff|bmp)$", ignore.case = TRUE)
  
  # Sort filenames
  image_files = mixedsort(image_files)  
  
  # Extract labels from filenames
  extract_label = function(filename) {
    base_name = tools::file_path_sans_ext(basename(filename))  
    label = gsub("[^0-9]", "", base_name)  # Extract numeric part
    return(label)
  }
  extract_filename = function(filepath) {
    return(basename(filepath))
  }
  read_data = function(image_path) {
    img = readImage(image_path)  
    red_aux   = as.vector(img[,,1])
    green_aux = as.vector(img[,,2])
    blue_aux  = as.vector(img[,,3])
    
    # Combine all channels into a single row
    img_vector = c(red_aux, green_aux, blue_aux)
    
    return(as.data.frame(t(img_vector)))  # Transpose to make it a row
  }
  
  image_list = lapply(image_files, function(file) {
    img_data = read_data(file)  
    img_data$Label = extract_label(file)
    img_data$ID = extract_filename(file) 
    return(img_data)
  })
  
  ax = do.call(rbind, image_list)
  return(ax)
}
### PCA Function
PCA.fun = function(X,matrix_bool = F){
  if (!matrix_bool){
    X = X %>% dplyr::select(-c("Label","ID"))
    print(dim(X))
    X = as.matrix(X)
  } else{
    X = X[,-ncol(X)]
    print(dim(X))
  }
  
  n = nrow(X)
  mu = colMeans(X)
  # center data
  G = sweep(X, 2, mu, FUN = "-")
  # Compute the covariance matrix
  cov_matrix =(1/(n-1))*(G %*% t(G))
  ## Calculate Eigenvalues and Eigenvectors and sort
  eig = eigen(cov_matrix)
  eig_val = eig$values
  eig_vec = eig$vectors
  ei_vec_large = t(G)%*%eig_vec
  # sort
  sort_index = order(eig_val, decreasing = TRUE)
  eig_val_sorted = eig_val[sort_index]
  eig_vec_sorted = ei_vec_large[, sort_index] 
  #variability of each PC
  D = eig_val_sorted/sum(eig_val_sorted)
  return(list("Eigen Vector"= eig_vec_sorted,"D"=D))
  }
### load data
folder_path = "Training_alpha"
data = read_all_images(folder_path)

labels = data$Label  
image_names = data$ID

# turn into matrix for some application
data_asmatrix  = as.matrix(data %>% dplyr::select(-ID))
data_asmatrix = apply(data_asmatrix, 2, as.numeric)
PCA_result = PCA.fun(data)
## [1]     12 108000
eig_vecs = PCA_result$"Eigen Vector"
cumulative_variance = cumsum(PCA_result$D)
num_components = which(cumulative_variance >= 0.95)[1]

# Project data onto selected PCs
data_reduced_PCA = data_asmatrix[,-ncol(data_asmatrix)] %*% eig_vecs[, 1:num_components]
data_reduced_PCA = as.data.frame(apply(data_reduced_PCA, 2, as.numeric))
data_reduced_PCA_labeled = cbind(data_reduced_PCA, labels) 
set.seed(123)
em_pca_results = EM_GMM(as.matrix(data_reduced_PCA), K = 2)
## Converged at iteration 8
labels = apply(em_pca_results$gamma, 1, function(row) which.max(row))
data_reduced_PCA_labeled$EM_labels = as.factor(labels)
data_reduced_PCA_labeled
##            V1         V2         V3 labels EM_labels
## 1  -10382.270  912.81077  735.82619      1         2
## 2  -10234.995  914.03785  761.83617      1         2
## 3  -11250.989  257.37601 -254.62464      1         1
## 4  -11156.964  336.70711 -163.18099      1         1
## 5  -11236.863  255.30750 -134.01965      1         1
## 6  -11352.240  304.26472 -172.13487      1         1
## 7    8950.352  118.26988  263.67167      2         2
## 8    9026.702 -186.90475  378.53420      2         2
## 9    8895.656   15.43497  325.94968      2         2
## 10   8548.154  816.27236  -45.80203      2         2
## 11   7832.108 1085.73597 -195.14015      2         2
## 12   7351.097 1252.83305 -156.06647      2         2
# Scale V1, V2, and V3
data_reduced_PCA_labeled_scaled = data_reduced_PCA_labeled
data_reduced_PCA_labeled_scaled[, c("V1", "V2", "V3")] = scale(data_reduced_PCA_labeled[, c("V1", "V2", "V3")])


# Create 3D plot using plotly
plot_ly(data_reduced_PCA_labeled_scaled, 
        x = ~V1, 
        y = ~V2, 
        z = ~V3, 
        color = ~labels, 
        colors = c("blue", "red"), 
        type = 'scatter3d', 
        mode = 'markers', 
        marker = list(size = 5, opacity = 0.7)) %>%
  layout(title = "3D PCA-Reduced Data Visualization", 
         scene = list(xaxis = list(title = 'V1'),
                      yaxis = list(title = 'V2'),
                      zaxis = list(title = 'V3')))
# Generate random samples from both Gaussians
set.seed(42)
samples1 = mvrnorm(n = 100, mu = em_pca_results$mu[1,], Sigma = em_pca_results$sigma[,,1])
samples2 = mvrnorm(n = 100, mu = em_pca_results$mu[2,], Sigma = em_pca_results$sigma[,,2])

# Create a dataframe with both Gaussian distributions
data_gaussians = data.frame(
  V1 = c(samples1[,1], samples2[,1]),
  V2 = c(samples1[,2], samples2[,2]),
  V3 = c(samples1[,3], samples2[,3]),
  labels = factor(c(rep(1, 100), rep(2, 100)))  # Labels for the two distributions
)

# Plot the Gaussians in 3D using V1, V2, and V3
plot_ly() %>%
  # Plot the first Gaussian (Gaussian 1)
  add_trace(data = data_gaussians[data_gaussians$labels == 1, ], 
            x = ~V1, y = ~V2, z = ~V3, 
            color = I("darkblue"), type = 'scatter3d', 
            mode = 'markers', marker = list(size = 6, opacity = 0.5),
            name = "Samples from EM Gaussian 1") %>%
  # Plot the second Gaussian (Gaussian 2)
  add_trace(data = data_gaussians[data_gaussians$labels == 2, ], 
            x = ~V1, y = ~V2, z = ~V3, 
            color = I("darkred"), type = 'scatter3d', 
            mode = 'markers', marker = list(size = 6, opacity = 0.5),
            name = "Samples from EM Gaussian 2") %>%
  add_trace(data = data_reduced_PCA_labeled_scaled[data_reduced_PCA_labeled_scaled$labels == 1, ], 
            x = ~V1, y = ~V2, z = ~V3, 
            color = I("orange"), type = 'scatter3d', 
            mode = 'markers', marker = list(size = 6, opacity = 0.5),
            name = "Image Points True ID = 1") %>%
  add_trace(data = data_reduced_PCA_labeled_scaled[data_reduced_PCA_labeled_scaled$labels == 2, ], 
            x = ~V1, y = ~V2, z = ~V3, 
            color = I("purple"), type = 'scatter3d', 
            mode = 'markers', marker = list(size = 6, opacity = 0.5),
            name = "Image Points True ID = 2") %>%
  layout(title = "3D Gaussian Distributions",
         scene = list(
           xaxis = list(title = 'V1'),
           yaxis = list(title = 'V2'),
           zaxis = list(title = 'V3')
         ))

Expectation-Maximization (EM) is a good algorithm for clustering but does not perform well when applied to classification. The primary problem is that EM clusters data points without knowing their true class labels, so one cluster might consist of more than one true classes and hence results in misclassification. Moreover, EM works on the assumption that data is distributed according to a Gaussian distribution, while real-world datasets usually possess intricate structures which are not captured by EM. As it only clusters data by similarity of features instead of learning class boundaries, its clusters might not correspond to the actual categories. Although EM is helpful in identifying patterns among unlabeled data, it is not a good way of classification unless supplemented with further steps in order to label its clusters with meaningful labels.

Conclusions

We analyzed the Expectation-Maximization algorithm and its applications in clustering, image segmentation, and classification. Our results reflect its strength and limitations. The EM is an efficient tool for finding patterns in unlabeled data; hence, it seems to be well-suited for clustering applications. In image segmentation, it correctly separates the different regions based on similarities of pixels but fails in precision in terms of classifying colors properly, especially in gradual transitions. For a classification application, EM fails since it does not learn class labels directly; most of the time, it comes with uncertain or wrong groupings. These results confirm that despite the importance of EM in unsupervised learning, it may not be a good choice for providing the sharpest decision boundaries or the most accurate classification. It is important in designing such to first consider the strength and weakness of EM in practical use.